support rlhf #184

Merged
merged 9 commits into from
Dec 22, 2023


Jian1273
Collaborator

  1. Add a reward model (RM) trainer
  2. Add an RM data preprocessing method
  3. Add configuration parameters to support RM training
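
For readers unfamiliar with reward model training, here is a minimal sketch of the pairwise ranking loss that RM trainers commonly minimize. It is an illustration only: the PR conversation does not show the trainer's actual loss, and the function name `pairwise_ranking_loss` and its plain-list inputs are assumptions made for this example, not code from the PR.

```python
import math


def pairwise_ranking_loss(chosen_rewards, rejected_rewards):
    """Mean of -log(sigmoid(r_chosen - r_rejected)) over all preference pairs.

    Hypothetical helper for illustration; not taken from the PR.
    """
    if len(chosen_rewards) != len(rejected_rewards):
        raise ValueError("chosen and rejected rewards must have the same length")
    if not chosen_rewards:
        raise ValueError("at least one preference pair is required")

    total = 0.0
    for r_chosen, r_rejected in zip(chosen_rewards, rejected_rewards):
        margin = r_chosen - r_rejected
        # -log(sigmoid(m)) equals log(1 + exp(-m)); this form avoids overflow
        # for large negative margins.
        total += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return total / len(chosen_rewards)
```

The loss shrinks as the model scores the preferred answer above the rejected one, and equals log 2 when both answers receive the same score.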

@csunny
Collaborator

csunny commented Dec 13, 2023

CI pipeline failed. Use the following command to fix this.

pip install black
black .

@wangzaistone
Member

@Jian1273 the checks did not pass.

Co-authored-by: qidanrui <[email protected]>
Co-authored-by: junewgl <[email protected]>
Co-authored-by: wangzaistone <[email protected]>
@qidanrui
Collaborator

@wangzaistone Apart from the black check, did the submitted code pass the tests? If so, I think we can merge it first and run black ourselves later.

@csunny
Collaborator

csunny commented Dec 19, 2023

> @wangzaistone Apart from the black check, did the submitted code pass the tests? If so, I think we can merge it first and run black ourselves later.

Yes, but first do the code review and tests. @wangzaistone + @junewgl

@junewgl
Collaborator

junewgl commented Dec 19, 2023

OK, I'm going to test the code. @csunny

@junewgl
Collaborator

junewgl commented Dec 19, 2023

> OK, I'm going to test the code. @csunny

In my testing the code runs successfully, but it requires adding a dataset: oaast_rm_zh.json
image

After running, the weights will be saved in the output directory, as shown below:
image

cc @csunny @wangzaistone @qidanrui

Thanks for your contribution, @Jian1273. I have some suggestions:
1. Consider adding evaluation code for the RM model, similar to Microsoft's RLHF code.
2. When you have time, submit the RL training code so that we can verify the overall RLHF effect.
3. After you submit all the RLHF code, we can test it on the text2sql task together.
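
As one possible shape for the evaluation suggested in point 1, a simple metric is pairwise accuracy: the fraction of preference pairs in which the reward model scores the chosen answer above the rejected one. The sketch below is an assumption for illustration; `pairwise_accuracy` is a hypothetical helper and is not taken from this PR or from Microsoft's code.

```python
def pairwise_accuracy(chosen_rewards, rejected_rewards):
    """Fraction of pairs where the chosen answer gets the strictly higher reward.

    Hypothetical helper for illustration; ties count as incorrect.
    """
    if len(chosen_rewards) != len(rejected_rewards):
        raise ValueError("chosen and rejected rewards must have the same length")
    if not chosen_rewards:
        raise ValueError("at least one preference pair is required")

    correct = sum(
        1
        for r_chosen, r_rejected in zip(chosen_rewards, rejected_rewards)
        if r_chosen > r_rejected
    )
    return correct / len(chosen_rewards)
```

A reward model that ranks pairs at random would score about 0.5 on this metric, so values well above that on a held-out set suggest the model has learned the preference signal.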

@qidanrui
Collaborator

I've rerun the CI and the code passes all checks. @junewgl @Jian1273 @csunny @wangzaistone

        preprocess_function = preprocess_supervised_dataset
        print_function = print_supervised_dataset_example
    elif stage == "rm":
        print(111111111111111111)
Contributor

Leftover debug code: this print statement should be removed.

Member

wangzaistone commented Dec 21, 2023

@oushu1zhangxiangxuan1 what are the details of the bug, and what command and environment did you use? I have passed to

@qidanrui qidanrui merged commit c1e1d53 into main Dec 22, 2023
6 checks passed
@csunny csunny deleted the rlhf branch February 6, 2024 05:48

6 participants